55 research outputs found

Deep Knowledge Tracing is an implicit dynamic multidimensional item response theory model

Authors: Kashima Hisashi; Vie Jill-Jênn
Publication date: 18/08/2023

Knowledge tracing consists of predicting the performance of students on new questions given their performance on previous questions, and can be a preliminary step toward optimizing assessment and learning. Deep knowledge tracing (DKT) is a competitive model for knowledge tracing that relies on recurrent neural networks, even if some simpler models may match its performance. However, little is known about why DKT works so well. In this paper, we frame deep knowledge tracing as an encoder-decoder architecture. This viewpoint not only allows us to propose better models in terms of performance, simplicity, or expressivity, but also opens up promising avenues for future research. In particular, we show on several small and large datasets that a simpler decoder, with possibly fewer parameters than the one used by DKT, can predict student performance better.

Comment: ICCE 2023 - The 31st International Conference on Computers in Education, Asia-Pacific Society for Computers in Education, Dec 2023, Matsue, Shimane, Franc

arXiv.org e-Print Archive

Privacy-Preserving Synthetic Educational Data Generation

Authors: Minn Sein; Rigaux Tomas; Vie Jill-Jênn
Publication venue: HAL CCSD
Publication date: 12/09/2022

Institutions collect massive learning traces but may not disclose them because of privacy concerns. Synthetic data generation opens new opportunities for research in education. In this paper, we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive open educational datasets.

INRIA a CCSD electronic archive server

Variational Factorization Machines for Preference Elicitation in Large-Scale Recommender Systems

Authors: Kashima Hisashi; Rigaux Tomas; Vie Jill-Jênn
Publication venue: HAL CCSD
Publication date: 17/12/2022

Factorization machines (FMs) are a powerful tool for regression and classification in the context of sparse observations, and have been successfully applied to collaborative filtering, especially when side information about users or items is available. Bayesian formulations of FMs have been proposed to provide confidence intervals over the predictions made by the model. However, they usually involve Markov chain Monte Carlo methods that require many samples to provide accurate predictions, resulting in slow training on large-scale data. In this paper, we propose a variational formulation of factorization machines that allows us to derive a simple objective that can be easily optimized using standard mini-batch stochastic gradient descent, making it amenable to large-scale data. Our algorithm learns an approximate posterior distribution over the user and item parameters, which leads to confidence intervals over the predictions. We show, using several datasets, that it has comparable or better performance than existing methods in terms of prediction accuracy, and we provide some applications to active learning strategies, e.g., preference elicitation techniques.

INRIA a CCSD electronic archive server

Using Posters to Recommend Anime and Mangas in a Cold-Start Scenario

Authors: Chalumeau Thomas; Clement Basile; Cocchi Kévin; Kashima Hisashi; Lahfa Ryan; Vie Jill-Jênn; Yger Florian
Publication date: 07/09/2017

Item cold-start is a classical issue in recommender systems that affects anime and manga recommendations as well. This problem can be framed as follows: how can we predict whether a user will like a manga that has received few ratings from the community? Content-based techniques can alleviate this issue but require extra information, which is usually expensive to gather. In this paper, we use a deep learning technique, Illustration2Vec, to easily extract tag information from manga and anime posters (e.g., sword or ponytail). We propose BALSE (Blended Alternate Least Squares with Explanation), a new model for collaborative filtering that benefits from this extra information to recommend manga. We show, using real data from an online manga recommender system called Mangaki, that our model substantially improves the quality of recommendations, especially for lesser-known manga, and is able to provide an interpretation of users' tastes.

Comment: 6 pages, 3 figures, 1 table, accepted at the MANPU 2017 workshop, co-located with ICDAR 2017 in Kyoto on November 10, 201

arXiv.org e-Print Archive

Evaluating DAS3H on the EdNet Dataset

Authors: Bourda Yolaine; Choffin Benoît; Popineau Fabrice; Vie Jill-Jênn
Publication venue: HAL CCSD
Publication date: 09/02/2021

The EdNet dataset is a massive English language dataset that poses unique challenges for student performance prediction. In this paper, we describe and comment on the results of our award-winning model DAS3H in the context of knowledge tracing on EdNet.

INRIA a CCSD electronic archive server

Interpretable Knowledge Tracing: Simple and Efficient Student Modeling with Causal Relations

Authors: Kashima Hisashi; Minn Sein; Takeuchi Koh; Vie Jill-Jênn; Zhu Feida
Publication venue: Association for the Advancement of Artificial Intelligence (AAAI)
Publication date: 15/12/2021

Intelligent Tutoring Systems have become critically important in future learning environments. Knowledge Tracing (KT) is a crucial part of such systems. It is about inferring the skill mastery of students and predicting their performance to adjust the curriculum accordingly. Deep learning-based KT models have shown significant predictive performance compared with traditional models. However, it is difficult to extract psychologically meaningful explanations that relate to cognitive theory from the tens of thousands of parameters in neural networks. There are several ways to achieve high accuracy in student performance prediction, but diagnostic and prognostic reasoning are more critical in the learning sciences. Since the KT problem has few observable features (problem ID and the student's correctness at each practice attempt), we extract meaningful latent features from students' response data by using machine learning and data mining techniques. In this work, we present Interpretable Knowledge Tracing (IKT), a simple model that relies on three meaningful latent features: individual skill mastery, ability profile (learning transfer across skills), and problem difficulty. IKT predicts future student performance using a Tree-Augmented Naive Bayes Classifier (TAN), so its predictions are easier to explain than those of deep learning-based student models. IKT also shows better student performance prediction than deep learning-based student models without requiring a huge number of parameters. We conduct ablation studies on each feature to examine its contribution to student performance prediction. Thus, IKT has great potential for providing adaptive and personalized instruction with causal reasoning in real-world educational systems.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Institutional Knowledge at Singapore Management University

Association for the Advancement of Artificial Intelligence: AAAI Publications

Cognitive diagnostic computerized adaptive testing models for large-scale learning

Author: Vie Jill-Jênn
Publication date: 05/12/2016

This thesis studies adaptive tests within learning environments. It falls within educational data mining and learning analytics, where student educational data is processed so as to optimize their learning.

Computerized assessments allow us to store and analyze student data easily, in order to provide better tests for future learners. In this thesis, we focus on computerized adaptive testing. Such adaptive tests can ask a question to the learner, analyze their answer on the fly, and choose the next question to ask accordingly. This process reduces the number of questions to ask a learner while keeping an accurate measurement of their level. Adaptive tests are widely used in practice today, for example in the GMAT and GRE standardized tests, which are administered to hundreds of thousands of students. Traditionally, models used for adaptive assessment have been mostly summative: they measure or rank examinees effectively, but do not provide any other feedback. Recent advances have focused on formative assessments, which provide more useful feedback for both the learner and the teacher; hence, they are more useful for improving student learning.

In this thesis, we have reviewed adaptive testing models from various research communities. We have compared them qualitatively and quantitatively. We have thus proposed an experimental protocol, which we have implemented in order to compare the most popular adaptive testing models on real data. This led us to propose a hybrid model for adaptive cognitive diagnosis that performs better than existing models for formative assessment on all the datasets we tried.

Finally, we have developed a strategy for asking several questions at the beginning of a test in order to measure the learner more accurately. This system can be applied to the automatic generation of worksheets, for example in a massive open online course (MOOC).

Theses.fr

Un algorithme de composition musicale

Author: Jill-Jênn Vie
Publication venue: EDP Sciences
Publication date: 25/03/2009

One of the many applications of Markov chains is random text generation. After a short description of the PMX language, which is used to write sheet music, this article proposes an analogous algorithm for generating music. The following sections explain how to compute the average number of measures generated by the composer and present various ways to improve it.
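
The Markov-chain generation idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the article's algorithm: the function names `build_chain` and `generate`, the first-order chain, and the toy sequence of note names are all assumptions made for illustration, and the tokens are plain letters rather than PMX syntax.

```python
import random
from collections import defaultdict


def build_chain(tokens):
    """Map each token to the list of tokens observed right after it."""
    chain = defaultdict(list)
    for current, following in zip(tokens, tokens[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, max_length, rng):
    """Random walk over the chain, stopping at a dead end or at max_length."""
    output = [start]
    while len(output) < max_length and chain.get(output[-1]):
        # Successors seen more often appear more often in the list,
        # so a uniform choice follows the empirical transition frequencies.
        output.append(rng.choice(chain[output[-1]]))
    return output


# Hypothetical training sequence: note names standing in for real input.
tokens = "c d e c e g c d e g".split()
chain = build_chain(tokens)
melody = generate(chain, "c", 8, random.Random(0))
print(" ".join(melody))
```

Storing repeated successors in a list keeps the sketch short; an explicit table of transition probabilities would be an equivalent design.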

EDP Sciences OAI-PMH repository (1.2.0)

Modèles de tests adaptatifs pour le diagnostic de connaissances dans un cadre d'apprentissage à grande échelle

Author: Vie Jill-Jênn
Publication venue: HAL CCSD
Publication date: 05/12/2016

This thesis studies adaptive tests within learning environments. It falls within educational data mining and learning analytics, where student educational data is processed so as to optimize their learning.

Computerized assessments allow us to store and analyze student data easily, in order to provide better tests for future learners. In this thesis, we focus on computerized adaptive testing. Such adaptive tests can ask a question to the learner, analyze their answer on the fly, and choose the next question to ask accordingly. This process reduces the number of questions to ask a learner while keeping an accurate measurement of their level. Adaptive tests are widely used in practice today, for example in the GMAT and GRE standardized tests, which are administered to hundreds of thousands of students. Traditionally, models used for adaptive assessment have been mostly summative: they measure or rank examinees effectively, but do not provide any other feedback. Recent advances have focused on formative assessments, which provide more useful feedback for both the learner and the teacher; hence, they are more useful for improving student learning.

In this thesis, we have reviewed adaptive testing models from various research communities. We have compared them qualitatively and quantitatively. We have thus proposed an experimental protocol, which we have implemented in order to compare the most popular adaptive testing models on real data. This led us to propose a hybrid model for adaptive cognitive diagnosis that performs better than existing models for formative assessment on all the datasets we tried.

Finally, we have developed a strategy for asking several questions at the beginning of a test in order to measure the learner more accurately. This system can be applied to the automatic generation of worksheets, for example in a massive open online course (MOOC).

HAL-CentraleSupelec

Thèses en Ligne

HAL-Rennes 1